{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using Neo4j's native GraphRAG SDK with AG2 agents for Question & Answering\n",
    "\n",
    "AG2 provides GraphRAG integration through agent capabilities. This is an example utilizing the integration of Neo4j's native GraphRAG SDK.\n",
    "The Neo4j native query engine enables the construction of a knowledge graph from a single text or PDF file. Additionally, you can define custom entities, relationships, or schemas to guide the graph-building process. Once created, you can integrate the RAG capabilities into AG2 agents to query the knowledge graph effectively. \n",
    "\n",
    "````{=mdx}\n",
    ":::info Requirements\n",
    "To install the neo4j GraphRAG SDK with OpenAI LLM\n",
    "\n",
    "```bash\n",
    "sudo apt-get install graphviz graphviz-dev\n",
    "pip install pygraphviz\n",
    "pip install \"neo4j-graphrag[openai, experimental]\"\n",
    "```\n",
    ":::\n",
    "````\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set Configuration and OpenAI API Key\n",
    "\n",
    "By default, in order to use OpenAI LLM with Neo4j you need to have an OpenAI key in your environment variable `OPENAI_API_KEY`.\n",
    "\n",
    "You can utilize an OAI_CONFIG_LIST file and extract the OpenAI API key and put it in the environment, as will be shown in the following cell.\n",
    "\n",
    "Alternatively, you can load the environment variable yourself.\n",
    "\n",
    "````{=mdx}\n",
    ":::tip\n",
    "Learn more about configuring LLMs for agents [here](https://docs.ag2.ai/latest/docs/user-guide/basic-concepts/llm-configuration).\n",
    ":::\n",
    "````\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import autogen\n",
    "\n",
    "llm_config = autogen.LLMConfig.from_json(path=\"OAI_CONFIG_LIST\")\n",
    "\n",
    "# Put the OpenAI API key into the environment\n",
    "os.environ[\"OPENAI_API_KEY\"] = llm_config.config_list[0][\"api_key\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is needed to allow nested asyncio calls for Neo4j in Jupyter\n",
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Set up LLM models\n",
    "\n",
    "**Important**  \n",
    "- **Default Models**:\n",
    "    - **Knowledge Graph Construction** OpenAI's `gpt-5` with `json_object` output `temperature=0.0`.\n",
    "    - **Question Answering**: OpenAI's `gpt-5` with `temperature=0.0`.\n",
    "    - **Embedding**: OpenAI's `text-embedding-3-large`. You need to provide its dimension for the query engine later.\n",
    "\n",
    "- **Customization**:\n",
    "  You can change these defaults by setting the following parameters on the `Neo4jNativeGraphQueryEngine`:\n",
    "    - `llm`: Specify a LLM instance with a llm you like for graph construction, it **must support json format response**\n",
    "    - `query_llm`: Specify a LLM instance with a llm you like for querying. **Don't use json format response.**\n",
    "    - `embedding`: Specify a Embedder instance with a embedding model.\n",
    "\n",
    "Learn more about configuring other LLM providers for agents [here](https://github.com/neo4j/neo4j-graphrag-python?tab=readme-ov-file#optional-dependencies).\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from neo4j_graphrag.embeddings import OpenAIEmbeddings\n",
    "from neo4j_graphrag.llm.openai_llm import OpenAILLM\n",
    "\n",
    "llm = OpenAILLM(\n",
    "    model_name=\"gpt-5\",\n",
    "    model_params={\n",
    "        \"response_format\": {\"type\": \"json_object\"},  # Json format response is required for the LLM\n",
    "        \"temperature\": 0,\n",
    "    },\n",
    ")\n",
    "\n",
    "query_llm = OpenAILLM(\n",
    "    model_name=\"gpt-5\",\n",
    "    model_params={\"temperature\": 0},  # Don't use json format response for the query LLM\n",
    ")\n",
    "\n",
    "embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Imports\n",
    "from autogen import ConversableAgent, UserProxyAgent\n",
    "from autogen.agentchat.contrib.graph_rag.document import Document, DocumentType\n",
    "from autogen.agentchat.contrib.graph_rag.neo4j_native_graph_query_engine import Neo4jNativeGraphQueryEngine\n",
    "from autogen.agentchat.contrib.graph_rag.neo4j_native_graph_rag_capability import Neo4jNativeGraphCapability"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a Knowledge Graph with Your Own Data\n",
    "\n",
    "**Note:** You need to have a Neo4j database running. If you are running one in a Docker container, please ensure your Docker network is setup to allow access to it. \n",
    "\n",
    "In this example, the Neo4j endpoint is set to host=\"bolt://172.17.0.3\" and port=7687, please adjust accordingly. For how to spin up a Neo4j with Docker, you can refer to [this](https://docs.llamaindex.ai/en/stable/examples/property_graph/property_graph_neo4j/#:~:text=stores%2Dneo4j-,Docker%20Setup,%C2%B6,-To%20launch%20Neo4j)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A Simple Example\n",
    "\n",
    "In this example, the graph schema is auto-generated. Entities and relationships are created as they fit into the data\n",
    "\n",
    "Neo4j GraphRAG SDK supports single document of 2 input types -- txt and pdf (images will be skipped). \n",
    "\n",
    "We start by creating a Neo4j knowledge graph with a sample text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load documents\n",
    "# To use text data, you need to:\n",
    "# 1. Specify the type as TEXT\n",
    "# 2. Pass the path to the text file\n",
    "\n",
    "input_path = \"../test/agentchat/contrib/graph_rag/BUZZ_Employee_Handbook.txt\"\n",
    "input_document = [Document(doctype=DocumentType.TEXT, path_or_url=input_path)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First we need to use the query engine to initialize the database. It performs the follows steps:\n",
    "\n",
    "1. Clears the existing database.\n",
    "2. Extracts graph nodes and relationships from the input data to build a knowledge graph.\n",
    "3. Creates a vector index for efficient retrieval.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = Neo4jNativeGraphQueryEngine(\n",
    "    host=\"bolt://172.17.0.3\",  # Change\n",
    "    port=7687,  # if needed\n",
    "    username=\"neo4j\",  # Change if you reset username\n",
    "    password=\"password\",  # Change if you reset password\n",
    "    llm=llm,  # change to the LLM model you want to use\n",
    "    embeddings=embeddings,  # change to the embeddings model you want to use\n",
    "    query_llm=query_llm,  # change to the query LLM model you want to use\n",
    "    embedding_dimension=3072,  # must match the dimension of the embeddings model\n",
    ")\n",
    "\n",
    "# initialize the database (it will delete any pre-existing data)\n",
    "query_engine.init_db(input_document)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Add capability to a ConversableAgent and query them\n",
    "The rag capability enables the agent to perform local search on the knowledge graph using the vector index created in the previous step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a ConversableAgent (no LLM configuration)\n",
    "graph_rag_agent = ConversableAgent(\n",
    "    name=\"buzz_agent\",\n",
    "    human_input_mode=\"NEVER\",\n",
    ")\n",
    "\n",
    "# Associate the capability with the agent\n",
    "graph_rag_capability = Neo4jNativeGraphCapability(query_engine)\n",
    "graph_rag_capability.add_to_agent(graph_rag_agent)\n",
    "\n",
    "# Create a user proxy agent to converse with our RAG agent\n",
    "user_proxy = UserProxyAgent(\n",
    "    name=\"user_proxy\",\n",
    "    human_input_mode=\"ALWAYS\",\n",
    ")\n",
    "\n",
    "user_proxy.initiate_chat(graph_rag_agent, message=\"Who is the employer?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Revisit the example by defining custom entities, relations and schema\n",
    "\n",
    "By providing custom entities, relations and schema, you could guide the engine to create a graph that better extracts the structure within the data. Custom schema must use provided entities and relations.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Custom entities, relations and schema that fits the document\n",
    "\n",
    "entities = [\"EMPLOYEE\", \"EMPLOYER\", \"POLICY\", \"BENEFIT\", \"POSITION\", \"DEPARTMENT\", \"CONTRACT\", \"RESPONSIBILITY\"]\n",
    "relations = [\n",
    "    \"FOLLOWS\",\n",
    "    \"PROVIDES\",\n",
    "    \"APPLIES_TO\",\n",
    "    \"ASSIGNED_TO\",\n",
    "    \"PART_OF\",\n",
    "    \"REQUIRES\",\n",
    "    \"ENTITLED_TO\",\n",
    "    \"REPORTS_TO\",\n",
    "]\n",
    "\n",
    "potential_schema = [\n",
    "    (\"EMPLOYEE\", \"FOLLOWS\", \"POLICY\"),\n",
    "    (\"EMPLOYEE\", \"ASSIGNED_TO\", \"POSITION\"),\n",
    "    (\"EMPLOYEE\", \"REPORTS_TO\", \"DEPARTMENT\"),\n",
    "    (\"EMPLOYER\", \"PROVIDES\", \"BENEFIT\"),\n",
    "    (\"EMPLOYER\", \"REQUIRES\", \"RESPONSIBILITY\"),\n",
    "    (\"POLICY\", \"APPLIES_TO\", \"EMPLOYEE\"),\n",
    "    (\"POLICY\", \"APPLIES_TO\", \"CONTRACT\"),\n",
    "    (\"POLICY\", \"REQUIRES\", \"RESPONSIBILITY\"),\n",
    "    (\"BENEFIT\", \"ENTITLED_TO\", \"EMPLOYEE\"),\n",
    "    (\"POSITION\", \"PART_OF\", \"DEPARTMENT\"),\n",
    "    (\"POSITION\", \"ASSIGNED_TO\", \"EMPLOYEE\"),\n",
    "    (\"CONTRACT\", \"REQUIRES\", \"RESPONSIBILITY\"),\n",
    "    (\"CONTRACT\", \"APPLIES_TO\", \"EMPLOYEE\"),\n",
    "    (\"RESPONSIBILITY\", \"ASSIGNED_TO\", \"POSITION\"),\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = Neo4jNativeGraphQueryEngine(\n",
    "    host=\"bolt://172.17.0.3\",  # Change\n",
    "    port=7687,  # if needed\n",
    "    username=\"neo4j\",  # Change if you reset username\n",
    "    password=\"password\",  # Change if you reset password\n",
    "    llm=llm,  # change to the LLM model you want to use\n",
    "    embeddings=embeddings,  # change to the embeddings model you want to use\n",
    "    query_llm=query_llm,  # change to the query LLM model you want to use\n",
    "    embedding_dimension=3072,  # must match the dimension of the embeddings model\n",
    "    entities=entities,\n",
    "    relations=relations,\n",
    "    potential_schema=potential_schema,\n",
    ")\n",
    "\n",
    "# initialize the database (it will delete any pre-existing data)\n",
    "query_engine.init_db(input_document)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Query the graph rag agent again\n",
    "If you inspect the database, you should find more nodes are created in the graph for each chunk of data this time. \n",
    "However, given the simple structure of input, the difference is not apparent in querying.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a ConversableAgent (no LLM configuration)\n",
    "graph_rag_agent = ConversableAgent(\n",
    "    name=\"buzz_agent\",\n",
    "    human_input_mode=\"NEVER\",\n",
    ")\n",
    "\n",
    "# Associate the capability with the agent\n",
    "graph_rag_capability = Neo4jNativeGraphCapability(query_engine)\n",
    "graph_rag_capability.add_to_agent(graph_rag_agent)\n",
    "\n",
    "# Create a user proxy agent to converse with our RAG agent\n",
    "user_proxy = UserProxyAgent(\n",
    "    name=\"user_proxy\",\n",
    "    human_input_mode=\"ALWAYS\",\n",
    ")\n",
    "\n",
    "user_proxy.initiate_chat(graph_rag_agent, message=\"Who is the employer?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Another example with pdf format input\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load documents\n",
    "\n",
    "# To use pdf data, you need to\n",
    "# 1. Specify the type as PDF\n",
    "# 2. Pass the path to the PDF file\n",
    "input_path = \"../test/agentchat/contrib/graph_rag/BUZZ_Employee_Handbook.pdf\"\n",
    "input_document = [Document(doctype=DocumentType.PDF, path_or_url=input_path)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = Neo4jNativeGraphQueryEngine(\n",
    "    host=\"bolt://172.17.0.3\",  # Change\n",
    "    port=7687,  # if needed\n",
    "    username=\"neo4j\",  # Change if you reset username\n",
    "    password=\"password\",  # Change if you reset password\n",
    "    llm=llm,  # change to the LLM model you want to use\n",
    "    embeddings=embeddings,  # change to the embeddings model you want to use\n",
    "    query_llm=query_llm,  # change to the query LLM model you want to use\n",
    "    embedding_dimension=3072,  # must match the dimension of the embeddings model\n",
    ")\n",
    "\n",
    "# initialize the database (it will delete any pre-existing data)\n",
    "query_engine.init_db(input_document)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a ConversableAgent (no LLM configuration)\n",
    "graph_rag_agent = ConversableAgent(\n",
    "    name=\"buzz_agent\",\n",
    "    human_input_mode=\"NEVER\",\n",
    ")\n",
    "\n",
    "# Associate the capability with the agent\n",
    "graph_rag_capability = Neo4jNativeGraphCapability(query_engine)\n",
    "graph_rag_capability.add_to_agent(graph_rag_agent)\n",
    "\n",
    "# Create a user proxy agent to converse with our RAG agent\n",
    "user_proxy = UserProxyAgent(\n",
    "    name=\"user_proxy\",\n",
    "    human_input_mode=\"ALWAYS\",\n",
    ")\n",
    "\n",
    "user_proxy.initiate_chat(graph_rag_agent, message=\"Who is the employer?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Incrementally add new documents to the existing knowledge graph.\n",
    "We add another document and build it into the existing graph"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "input_path = \"../test/agentchat/contrib/graph_rag/the_matrix.txt\"\n",
    "input_documents = [Document(doctype=DocumentType.TEXT, path_or_url=input_path)]\n",
    "\n",
    "_ = query_engine.add_records(input_documents)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's query the graph about both old and new documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a ConversableAgent (no LLM configuration)\n",
    "graph_rag_agent = ConversableAgent(\n",
    "    name=\"new_agent\",\n",
    "    human_input_mode=\"NEVER\",\n",
    ")\n",
    "\n",
    "# Associate the capability with the agent\n",
    "graph_rag_capability = Neo4jNativeGraphCapability(query_engine)\n",
    "graph_rag_capability.add_to_agent(graph_rag_agent)\n",
    "\n",
    "# Create a user proxy agent to converse with our RAG agent\n",
    "user_proxy = UserProxyAgent(\n",
    "    name=\"user_proxy\",\n",
    "    human_input_mode=\"ALWAYS\",\n",
    ")\n",
    "\n",
    "user_proxy.initiate_chat(graph_rag_agent, message=\"Who is the employer?\")"
   ]
  }
 ],
 "metadata": {
  "front_matter": {
   "description": "Neo4j Native GraphRAG utilizes a knowledge graph and can be added as a capability to agents.",
   "tags": [
    "RAG"
   ]
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
